
185 research outputs found

Algorithms for Differentially Private Multi-Armed Bandits

Author: Dimitrakakis Christos
Tossou Aristide
Publication date: 27/11/2015

We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem arises in applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is connected to individual rewards. Our major contribution is to show that there exist $(\epsilon, \delta)$ differentially private variants of Upper Confidence Bound algorithms which have optimal regret, $O(\epsilon^{-1} + \log T)$. This is a significant improvement over previous results, which only achieve poly-log regret $O(\epsilon^{-2} \log^{2} T)$, because of our use of a novel interval-based mechanism. We also substantially improve the bounds of the previous family of algorithms which use a continual release mechanism. Experiments clearly validate our theoretical bounds.

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL Descartes

Chalmers Research

Chalmers Publication Library

Hal-Diderot

Association for the Advancement of Artificial Intelligence: AAAI Publications
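
To make the setting of the differentially private bandit entry above concrete, here is a minimal Python sketch of an Upper Confidence Bound loop whose reward sums are perturbed with Laplace noise before the index is computed. It is not the interval-based or continual release mechanism from the abstract: drawing fresh noise every round, as done here, does not by itself give an overall privacy guarantee of `epsilon` across all rounds, and no regret bound is claimed. The function names, the Bernoulli arms, and the bonus term are all assumptions chosen for illustration.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_ucb(arm_means, horizon, epsilon, seed=0):
    """Naive illustration: UCB on Bernoulli arms with Laplace-perturbed sums.

    Hypothetical sketch only; NOT the mechanism of the paper listed above.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            def index(a):
                # Rewards lie in [0, 1], so each sum has sensitivity 1
                # and the Laplace noise has scale 1 / epsilon.
                noisy_mean = (sums[a] + laplace(1.0 / epsilon, rng)) / counts[a]
                bonus = math.sqrt(2.0 * math.log(t) / counts[a])
                return noisy_mean + bonus
            arm = max(range(k), key=index)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Calling `noisy_ucb([0.2, 0.8], horizon=1000, epsilon=1.0)` returns the per-arm pull counts; smaller values of `epsilon` inject more noise into the index and can be expected to slow down identification of the better arm.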

Probabilistic inverse reinforcement learning in unknown environments

Author: Dimitrakakis Christos
Tossou Aristide
Publication date: 01/01/2013

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Chalmers Research

Chalmers Publication Library

Phoneme and sentence-level ensembles for speech recognition

Author: Bengio Samy
Dimitrakakis Christos
Publication date: 01/01/2011

We address the question of whether and how boosting and bagging can be used for speech recognition. In order to do this, we compare two different boosting schemes, one at the phoneme level and one at the utterance level, with a phoneme-level bagging scheme. We control for many parameters and other choices, such as the state inference scheme used. In an unbiased experiment, we clearly show that the gain of boosting methods compared to a single hidden Markov model is in all cases only marginal, while bagging significantly outperforms all other methods. We thus conclude that bagging methods, which have so far been overlooked in favour of boosting, should be examined more closely as a potentially useful ensemble learning technique for speech recognition.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Chalmers Research

Hochschulschriftenserver - Universität Frankfurt am Main

Generalised Entropy MDPs and Minimax Regret

Author: Androulakis Emmanouil G.
Dimitrakakis Christos
Publication date: 01/01/2014

Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical. Comment: 7 pages, NIPS workshop "From bad models to good policies"

arXiv.org e-Print Archive

Chalmers Research

Expected loss analysis of thresholded authentication protocols in noisy conditions

Author: Dimitrakakis Christos
Mitrokotsa Aikaterini
Vaudenay Serge
Publication date: 01/09/2010

A number of authentication protocols have been proposed recently, where at least some part of the authentication is performed during a phase, lasting $n$ rounds, with no error correction. This requires assigning an acceptable threshold for the number of detected errors. This paper describes a framework enabling an expected loss analysis for all the protocols in this family. Furthermore, computationally simple methods to obtain nearly optimal values of the threshold, as well as of the number of rounds, are suggested. Finally, a method to adaptively select both the number of rounds and the threshold is proposed. Comment: 17 pages, 2 figures; draf

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX
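
The threshold question raised in the authentication entry above can be illustrated with a small Python sketch. It assumes, purely for illustration, that errors in the $n$ rounds are independent, that a legitimate user errs in each round with probability `p_noise`, that an attacker errs with probability `p_attack`, and that a session is accepted when at most `tau` errors are detected. These modelling choices, the loss weights, and all function names are assumptions, not the framework of the paper.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(min(k, n) + 1))


def expected_loss(tau, n, p_noise, p_attack, prior_attack, loss_fa, loss_fr):
    """Expected loss of accepting when at most `tau` of `n` rounds are wrong.

    Hypothetical model: independent per-round errors for both parties.
    """
    false_reject = 1.0 - binom_cdf(tau, n, p_noise)   # legitimate user rejected
    false_accept = binom_cdf(tau, n, p_attack)        # attacker accepted
    return (prior_attack * loss_fa * false_accept
            + (1.0 - prior_attack) * loss_fr * false_reject)


def best_threshold(n, **params):
    """Exhaustively pick the threshold in 0..n with the smallest expected loss."""
    return min(range(n + 1), key=lambda tau: expected_loss(tau, n, **params))
```

For example, `best_threshold(20, p_noise=0.05, p_attack=0.5, prior_attack=0.1, loss_fa=10.0, loss_fr=1.0)` scans all 21 candidate thresholds. Exhaustive search is used here only because it is easy to check; the abstract refers to computationally simple methods that are not reproduced in this sketch.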

Cover Tree Bayesian Reinforcement Learning

Author: Blekas Konstantinos
Dimitrakakis Christos
Tziortziotis Nikolaos
Publication date: 08/12/2013

This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with least squares policy iteration.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Chalmers Research

Chalmers Publication Library